Abstract


It has been shown in the past that a multistart hillclimbing strategy compares favourably
to a standard genetic algorithm with respect to solving instances of the multimodal problem
generator. We extend that work and verify whether the use of diversity preservation techniques
in the genetic algorithm changes the outcome of the comparison. We do so under two scenarios:
(1) when the goal is to find the global optimum, (2) when the goal is to find all optima.

A mathematical analysis is performed for the multistart hillclimbing algorithm and a thorough
empirical study is conducted for solving instances of the multimodal problem generator with
increasing number of optima, both with the hillclimbing strategy as well as with genetic
algorithms with niching. Although niching improves the performance of the genetic algorithm, it is
still inferior to the multistart hillclimbing strategy on this class of problems.

An idealized niching strategy is also presented and it is argued that its performance should 
be close to a lower bound of what any evolutionary algorithm can do on this class of problems. 

1 Introduction 

One often hears that one of the advantages of evolutionary algorithms (EAs) with respect to
hillclimbing algorithms is their ability to escape local optima. By using a population of solutions, EAs
can explore multiple basins of attraction simultaneously, and together with diversity preservation 
techniques, EAs can potentially maintain a diverse set of solutions and be able to locate multiple 
optima in a single run. Hillclimbers, however, can be restarted over and over, and in some cases 
such a strategy should not be dismissed. 

In this paper we look at instances of the multimodal problem generator, a class of artificial 
problems originally proposed in [2] and used by several researchers in subsequent studies. It has been 
hinted before that a multistart hillclimbing strategy outperforms standard evolutionary algorithms 
on this class of problems [6]. However, that study only considered EAs without niching. Herein we 
investigate if the incorporation of niching in an EA changes the outcome of the comparison. 

The paper is organised as follows. The next section describes the multimodal problem generator.
Section 3 presents a mathematical analysis of multistart next ascent hillclimbing on this class of
problems and confirms it with computer simulations. Sections 4 and 5 assess whether the incorporation
of niching in genetic algorithms allows them to be competitive with the multistart hillclimbing
strategy. In Section 6 an idealized niching strategy for this class of problems is suggested and
analysed. Finally, Section 7 summarises and presents the major conclusions of this work.

2 The multimodal problem generator


The generator creates problem instances with a certain number of peaks (the degree of
multimodality). For a problem with n peaks, n bit-strings of length L are randomly generated. Each
of these strings is a peak (a local optimum) in the landscape. Different heights can be assigned
to different peaks. To evaluate an arbitrary individual x, first locate its nearest peak in Hamming
space,

$$\mathrm{nearest}(x) = \arg\min_{i=1..n} \mathrm{HammingDist}(x, \mathrm{Peak}_i).$$

The fitness of x is the number of bits the string has in common with its nearest peak, divided
by L, and scaled by the height of the nearest peak.

$$f(x) = \frac{L - \mathrm{HammingDist}(x, \mathrm{Peak}_{\mathrm{nearest}(x)})}{L} \cdot \mathrm{Height}(\mathrm{Peak}_{\mathrm{nearest}(x)}).$$

In this paper we will be using equal-height as well as unequal-height peaks. For the unequal 
case, peak heights are linearly interpolated between a minimum value of 0.5 and a maximum value 
of 1.0. Also, when locating the nearest peak to a string x, ties are broken uniformly at random. 

The difficulty of the problem depends on the number of peaks, the distribution of peak heights, 
and on the distribution of the peak locations. 
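
The definition above can be sketched in a few lines of Python. This is only an illustrative reading of the generator, not the implementation used in the experiments: the names `make_instance`, `fitness` and `hamming` are hypothetical, and assigning the interpolated heights to the peaks in index order is an assumption.

```python
import random

def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def make_instance(n, L, min_height=1.0, max_height=1.0, rng=random):
    """Generate n random peaks of length L with linearly interpolated heights."""
    peaks = [tuple(rng.randint(0, 1) for _ in range(L)) for _ in range(n)]
    if n == 1:
        heights = [max_height]
    else:
        # Assumption: heights are assigned to the peaks in index order.
        step = (max_height - min_height) / (n - 1)
        heights = [min_height + i * step for i in range(n)]
    return peaks, heights

def fitness(x, peaks, heights, rng=random):
    """Fraction of bits shared with the nearest peak, scaled by that peak's height."""
    L = len(x)
    dists = [hamming(x, p) for p in peaks]
    d_min = min(dists)
    # Ties for the nearest peak are broken uniformly at random, as in the text.
    nearest = rng.choice([i for i, d in enumerate(dists) if d == d_min])
    return (L - d_min) / L * heights[nearest]
```

For the unequal-height case of this paper one would call `make_instance(n, 100, 0.5, 1.0)`. Because of the random tie-breaking, `fitness` is not strictly deterministic for strings that are equidistant from two peaks of different heights.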

3 Multistart next ascent hillclimbing and its analysis 

We applied a multistart next ascent hillclimbing algorithm to solve instances of the multimodal 
problem generator. Starting from a random solution, the algorithm climbs up the peak using next 
ascent hillclimbing (NAHC). Once there, it restarts from another random solution, and keeps doing 
that over and over until some other stopping criterion is reached. 

NAHC explores the neighbourhood of the current solution in a randomly generated order. As 
soon as a neighbour s is found with better fitness than the current solution, that neighbour s 
becomes the current solution. This process is repeated until no neighbour improves upon the 
current solution. In this paper we consider the neighbourhood of a string x to be the set of strings 
whose Hamming distance to x is 1. 

Our NAHC implementation takes care not to explore the neighbour of a string x in case that
neighbour was the previous current solution, because that solution could not possibly yield an
improvement upon x. For completeness, we present pseudocode of NAHC in Algorithm 1.

3.1 Analysis of NAHC on a single peak 

Let X_nahc(L, d) be a random variable denoting the number of fitness evaluations needed by NAHC
to find the optimum on a single peak problem instance assuming the starting string x of length L
is d bits away from the top of the peak. The expected value of X_nahc(L, d) is given by Equation 1.

$$E[X_{\text{nahc}}(L, d)] = 1 + \frac{L}{d} + \sum_{k=1}^{d-1} \frac{L-1}{k} \qquad (1)$$

Algorithm 1: NAHC
Input : A problem instance
Output: A local optimum

x ← random solution
evaluate(x)
prev ← NULL
repeat
    hilltop ← true
    foreach s ∈ neighbourhood(x) \ {prev} do
        evaluate(s)
        if s.fitness > x.fitness then
            hilltop ← false
            prev ← x
            x ← s
            break
        end
    end
until hilltop
return x
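
As a rough illustration, Algorithm 1 could be written in Python as follows. This is a sketch under stated assumptions rather than the code used for the experiments: the function name `nahc`, its signature, and the trick of remembering the last flipped bit (instead of storing the previous solution itself) are choices made here for brevity.

```python
import random

def nahc(fitness, L, rng=random):
    """Next ascent hillclimbing over bit strings of length L.

    Returns (local_optimum, its_fitness, number_of_fitness_evaluations).
    """
    x = [rng.randint(0, 1) for _ in range(L)]
    fx = fitness(x)
    evals = 1
    prev_bit = None           # flipping this bit would recreate the previous solution
    hilltop = False
    while not hilltop:
        hilltop = True
        order = [i for i in range(L) if i != prev_bit]
        rng.shuffle(order)    # neighbours are visited in a random order
        for i in order:
            x[i] ^= 1         # move to the neighbour
            fs = fitness(x)
            evals += 1
            if fs > fx:
                fx, prev_bit, hilltop = fs, i, False
                break         # accept the first improving neighbour
            x[i] ^= 1         # not an improvement: undo the flip
    return x, fx, evals

# Usage on a single-peak instance (a hypothetical peak chosen for illustration).
peak = [1, 0] * 10
single_peak = lambda s: sum(a == b for a, b in zip(s, peak)) / len(peak)
best, best_fitness, evals = nahc(single_peak, len(peak), random.Random(1))
```

Note that the returned evaluation count includes the final sweep that confirms no neighbour improves, so it is somewhat larger than the quantity X_nahc(L, d) analysed in the text, which counts evaluations only up to the moment the optimum is found.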


The first term in Equation 1 corresponds to the evaluation of the starting string. The second
term, L/d, is the expected number of bit flips needed to get an improvement in fitness when the
neighbourhood of x is explored for the first time (the first time line 6 of Algorithm 1, the foreach loop, is executed).
Since x is d bits away from the optimum, the probability of getting an improvement on a single bit 
flip is d/L. The expected number of bit flips for an improvement is the inverse of that, L/d. The 
remaining terms are the expected number of bit flips needed for a fitness improvement when the 
current string x is at distance 1, 2, ..., d — 1, from the optimum. These terms have L — 1 instead 
of L in the numerator because the previous current solution is not evaluated. 

The Hamming distance between two randomly generated bit-strings of length L follows a Binomial
distribution with L independent Bernoulli trials, each with success probability p = 1/2. The
expected distance d between the starting string and the top of the peak is L/2. Substituting this
value in Equation 1 gives


$$\begin{aligned}
E[X_{\text{nahc}}(L, d = L/2)] &= 3 + (L-1) \sum_{k=1}^{L/2-1} \frac{1}{k} \\
&< 3 + (L-1)\,(1 + \lg(L/2 - 1)) \\
&= O(L \lg L). \qquad (2)
\end{aligned}$$
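
To make the bound concrete, the following sketch evaluates the sum in Equation 1 numerically and compares it with the closed-form upper bound of Equation 2. The function names are hypothetical, d is assumed to be a positive integer, and lg is taken to be the base-2 logarithm.

```python
from math import log2

def expected_nahc_evals(L, d):
    """Equation 1: 1 + L/d + (L - 1) * sum_{k=1}^{d-1} 1/k, for integer d >= 1."""
    harmonic = sum(1.0 / k for k in range(1, d))
    return 1 + L / d + (L - 1) * harmonic

def upper_bound(L):
    """Right-hand side of the inequality in Equation 2 (even L >= 4)."""
    return 3 + (L - 1) * (1 + log2(L // 2 - 1))

for L in (20, 100, 1000):
    print(L, round(expected_nahc_evals(L, L // 2), 1), round(upper_bound(L), 1))
```

For L = 100 and d = 50 the exact sum evaluates to roughly 446 fitness evaluations, comfortably below the bound.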

3.2 Analysis of multistart NAHC on multiple peak instances 

We analyse the running time of multistart next ascent hillclimbing (MS-NAHC) when solving an
instance of the multimodal problem generator with n equal-height peaks. The analysis assumes
that all peaks are equally likely to be reached on a single execution of NAHC. This assumption is
reasonable when considering equal-height randomly distributed peaks.

3.2.1 Find a particular peak


Let X_peak be a random variable denoting the number of times NAHC is called within the multistart
strategy in order to reach the top of a particular peak, say Peak_1. The variable X_peak follows a
Geometric distribution with success probability p = 1/n and its expected value is 1/p = n.

$$E[X_{\text{peak}}] = n. \qquad (3)$$

Let X_ms-nahc(L, n) be a random variable denoting the number of fitness evaluations needed
by MS-NAHC to find the top of a particular peak of an n-peak problem instance of length L.
To obtain E[X_ms-nahc(L, n)] we need to multiply the expected number of NAHC executions by the
expected number of evaluations to climb to the top of a peak. The former is given by Equation 3
and the latter by Equation 1. The only thing missing is the expected value of the distance d of the
starting string to the top of the peak, needed by Equation 1.

For a single-peak problem the expected value of d is L/2. For a larger number of peaks, however, 
the expected value of d is less than that because we need to consider the first order statistic of 
a set of n independent samples of the Binomial distribution. This is so because according to the 
definition of the multimodal problem generator we first locate the nearest peak among the set of n 
peaks. We shall look into the order statistics later on. For now, we assume a conservative value of
d = L/2 to obtain an asymptotic upper bound on the expected number of fitness evaluations.

$$\begin{aligned}
E[X_{\text{ms-nahc}}(L, n)] &= E[X_{\text{peak}}] \cdot E[X_{\text{nahc}}(L, d)] \\
&< E[X_{\text{peak}}] \cdot E[X_{\text{nahc}}(L, d = L/2)] \\
&= O(n L \lg L). \qquad (4)
\end{aligned}$$


3.2.2 Find all peaks 

We do a similar analysis for the case when the goal of the algorithm is to reach the top of all peaks.
Let X_all be a random variable denoting the number of times NAHC is called within the multistart
strategy in order to reach the top of all peaks. The expected value of X_all is given by

$$E[X_{\text{all}}] = n \sum_{k=1}^{n} \frac{1}{k} = O(n \lg n). \qquad (5)$$

The reasoning is as follows. Suppose we have already found k < n peaks. The probability of
reaching an unfound peak on a single execution of NAHC is (n - k)/n. The number of NAHC
executions to find such a peak follows again a Geometric distribution with success probability
(n - k)/n, and has the expected value n/(n - k). Summing the expected values for k ranging from
0 to n - 1 yields $\sum_{k=0}^{n-1} n/(n-k) = n \sum_{k=1}^{n} 1/k$.
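
The argument is the classical coupon collector calculation, and it can be checked with a small simulation. The sketch below abstracts each NAHC run as a uniform draw over the n peaks, which is exactly the equal-likelihood assumption stated in Section 3.2; the function names are illustrative.

```python
import random

def expected_calls_all_peaks(n):
    """Equation 5: n * H_n, the expected number of NAHC calls to reach all n peaks."""
    return n * sum(1.0 / k for k in range(1, n + 1))

def simulate_calls_all_peaks(n, rng):
    """Count uniform draws over n peaks until every peak has been drawn once."""
    found, calls = set(), 0
    while len(found) < n:
        found.add(rng.randrange(n))
        calls += 1
    return calls

rng = random.Random(42)
n, runs = 20, 10000
mean_calls = sum(simulate_calls_all_peaks(n, rng) for _ in range(runs)) / runs
print(f"theory {expected_calls_all_peaks(n):.1f}, simulated {mean_calls:.1f}")
```

For n = 20 the formula gives about 72 calls, and the simulated mean should land close to that value.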

Similarly to the previous case, let X_ms-nahc-all(L, n) be a random variable denoting the number
of fitness evaluations needed by MS-NAHC to find the top of all peaks on an n-peak problem
instance of length L.

$$\begin{aligned}
E[X_{\text{ms-nahc-all}}(L, n)] &= E[X_{\text{all}}] \cdot E[X_{\text{nahc}}(L, d)] \\
&< E[X_{\text{all}}] \cdot E[X_{\text{nahc}}(L, d = L/2)] \\
&= O(n \lg n \cdot L \lg L). \qquad (6)
\end{aligned}$$

3.2.3 Order statistics


We use a conservative value of d = L/2 in Equations 4 and 6. A better estimate can be obtained
by considering the first order statistic (i.e., the minimum) of a set of n independent samples of the
Binomial distribution. For a sufficiently large number of trials L, the Binomial distribution can be
approximated by the Normal distribution with mean μ = Lp and standard deviation σ = √(Lp(1 - p)).
Royston [9] quotes the following formula by Blom [1] as a good approximation to the expected value
E[X_{r:n}] of the r-th order statistic of a set of n samples of a Normally distributed random variable X
with mean μ and standard deviation σ,

$$E[X_{r:n}] \approx \mu + \sigma\, \Phi^{-1}\!\left(\frac{r - \alpha}{n - 2\alpha + 1}\right) \qquad (7)$$

with α = 0.375 and Φ^{-1} being the inverse of the cumulative distribution function of the standard
Normal distribution. Using r = 1 and α = 0.375 as suggested by Blom, we can obtain values for
the first order statistic as needed.
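
Blom's approximation is straightforward to evaluate with the Python standard library, whose `statistics.NormalDist.inv_cdf` provides the inverse Normal CDF. The sketch below is an illustration only; the function name `blom_order_statistic` is hypothetical, and the loop simply tabulates the first order statistic for the standard Normal and for a Normal with μ = L/2 and σ = √(L/4).

```python
from statistics import NormalDist

def blom_order_statistic(r, n, mu=0.0, sigma=1.0, alpha=0.375):
    """Blom's approximation to E[X_{r:n}] for n samples of Normal(mu, sigma)."""
    p = (r - alpha) / (n - 2 * alpha + 1)
    return mu + sigma * NormalDist().inv_cdf(p)

L = 100
for n in (20, 40, 80, 160, 320):
    z = blom_order_statistic(1, n)                          # standard Normal minimum
    d = blom_order_statistic(1, n, mu=L / 2, sigma=(L / 4) ** 0.5)
    print(n, round(z, 2), round(d, 2))
```

With n = 20 this gives a first order statistic of about -1.87 and an expected distance of about 40.65 bits for L = 100. Values for other n may differ in the second decimal place from figures computed with other numerical routines.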

3.3 Experimental verification 

We applied the MS-NAHC algorithm to randomly generated instances of the multimodal problem 
generator with 20, 40, 80, 160, and 320 peaks. We use a fixed string length L of 100 bits on all 
problems. 100 independent runs were performed for each problem instance. We did two kinds of 
experiments with instances containing (1) equal-height peaks, and (2) unequal-height peaks with 
the heights linearly interpolated between 0.5 and 1.0. For each set of experiments, we use 2 different
stopping criteria: (a) reach the top of Peak_1, (b) reach the top of all peaks. Together, this gives 4
sets of experiments.

For the equal-height peak problem instances, the results are shown in Table 1. The theoretical
values shown in the table for the expected number of NAHC calls were obtained
from Equations 3 and 5. Likewise, the expected number of fitness evaluations was obtained from
Equations 4 and 6, but rather than using the conservative value of d = L/2 we use order statistics
to get a better estimate of d. Table 2 shows the values obtained using the approximation from formula 7
for the first order statistic of a standard Normal distribution for sample sizes n = 20, 40, 80, 160, 320,
along with the expected distance d to the top of the nearest peak when using a string length L = 100.
Recall that with L = 100, the Binomial distribution arising from the distance between randomly
generated bit-strings is approximated by a Normal distribution with mean μ = L/2 = 50 and
standard deviation σ = √(L/4) = 5. The theoretical prediction matches well with the experimental results.

For the unequal-height peak problem instances, the results are shown in Table 3. In this case, 
finding the highest fit peak is the target goal for the first set of experiments; for the second set of 
experiments the goal is to find all the peaks. 

For these cases the theory presented in Section 3.2 is not applicable because the assumption
that all peaks are equally likely to be reached by NAHC does not hold; higher fit peaks are
more likely to be reached. This occurs because in the early stages of hillclimbing there’s a good 
chance that a solution can jump to a basin of attraction of a higher fit peak. Since higher fit peaks 
are more likely to be reached, it should be expected that MS-NAHC needs on average fewer restarts 
in order to find the best peak (as opposed to any other peak). Likewise, we should expect a higher 
number of restarts in order to find all peaks when compared to the equal-height peak scenario 
because the lower height peaks are more difficult to obtain. This hypothesis is confirmed by the 
experimental results.

Table 1: Average number of NAHC calls and average number of fitness evaluations (over 100
independent runs) needed by MS-NAHC, along with expected theoretical values, on equal-height
peak problem instances for two target goals: (1) find a particular peak (e.g., Peak_1), (2) find all
peaks.

Equal-height instances

Goal          n    NAHC calls   theory     evals    theory
Peak_1       20        21.2       20        9198      8540
Peak_1       40        37.4       40       16032     16885
Peak_1       80        82.4       80       34967     33567
Peak_1      160       166.9      160       70519     66718
Peak_1      320       342.6      320      143227    131699
All Peaks    20        75.7       72.0     32708     30726
All Peaks    40       174.6      171.1     74765     72243
All Peaks    80       411.2      397.2    174552    166677
All Peaks   160       921.3      904.9    388356    377321
All Peaks   320      2094.9     2031.1    877035    835908


Table 2: First order statistic X_{1:n} of the standard Normal distribution for various sample sizes,
along with the expected distance d to the top of the nearest of n peaks on a 100 bit problem.

n            20      40      80     160     320
X_{1:n}   -1.87   -2.16   -2.43   -2.66   -2.91
E[d]      40.65   39.20   37.85   36.70   35.45


Table 3: Average number of NAHC calls and average number of fitness evaluations (over 100
independent runs) needed by MS-NAHC, on unequal-height peak problem instances for two target
goals: (1) find the best peak, (2) find all peaks.

Unequal-height instances

Goal          n    NAHC calls      evals
Best Peak    20        14.4         6206
Best Peak    40        23.8        10189
Best Peak    80        37.1        15749
Best Peak   160        83.1        35156
Best Peak   320       171.5        71864
All Peaks    20        83.3        36378
All Peaks    40       212.7        91135
All Peaks    80       557.4       237199
All Peaks   160      1360.6       574617
All Peaks   320      3230.4      1355815

4 Genetic algorithms with niching


A standard genetic algorithm (sGA) without diversity preservation techniques is not able to
reliably reach the top of the best peak on instances of the multimodal problem generator unless very
large population sizes are used [6]. The cited work showed empirically that a multistart hillclimbing
algorithm is more efficient than the sGA for solving instances of this class of problems.

Diversity preservation techniques, commonly referred to as niching, have been investigated for
many years. They are especially useful for solving multimodal optimization problems. By
maintaining diversity in a population of solutions it is expected that the GA can maintain basins of
attraction of several optima for long periods of time, allowing it to obtain multiple optimal or
near-optimal solutions in a single run.

Several niching techniques have been proposed in the literature. Here we explore restricted
tournament selection (RTS) [3], a crowding-like mechanism that has been shown to be quite effective
in practice. RTS incorporates the notion of local competition within a steady-state GA, forcing a
new individual to compete with an existing population member that is similar to it. RTS works as
follows. Two solutions are drawn uniformly at random from the population, call them A and B.
These two solutions undergo crossover and mutation giving rise to two new solutions, A' and B'.
Then, for each new solution (A' and B'), scan w individuals randomly chosen from the population
and pick the one that is most similar to it. Call them A'' and B''. A' competes with A''. If A' is
better than A'' it replaces A'' in the population, otherwise A' is discarded. A similar competition
takes place between B' and B''. The parameter w of RTS is often referred to as the window size.
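
One steady-state step of RTS, as described above, might look as follows in Python. This is a minimal sketch, not the implementation used in the experiments: the function names, the list-of-bits representation, applying crossover with probability `pc`, and re-evaluating fitness on every comparison (rather than caching it) are all assumptions made for brevity.

```python
import random

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def uniform_crossover(a, b, rng):
    """Swap each position between the two parents with probability 1/2."""
    c, d = list(a), list(b)
    for i in range(len(a)):
        if rng.random() < 0.5:
            c[i], d[i] = d[i], c[i]
    return c, d

def mutate(x, rate, rng):
    """Flip each bit independently with probability `rate`; returns a new list."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in x]

def rts_step(pop, fitness, w, pc, rng):
    """One RTS step; pop is a list of bit lists and is modified in place."""
    a, b = rng.choice(pop), rng.choice(pop)
    if rng.random() < pc:
        a, b = uniform_crossover(a, b, rng)
    for child in (mutate(a, 1 / len(a), rng), mutate(b, 1 / len(b), rng)):
        # Scan w random population members and pick the one most similar to the child.
        window = rng.sample(range(len(pop)), min(w, len(pop)))
        closest = min(window, key=lambda i: hamming(child, pop[i]))
        if fitness(child) > fitness(pop[closest]):
            pop[closest] = child
```

Because a population member is only ever replaced by a strictly better child, the total fitness of the population never decreases from one step to the next.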

RTS does not have a mating restriction mechanism preventing solutions from different basins
of attraction from mating with each other. On instances of the multimodal problem generator, where
peaks are randomly generated, crossing solutions near the top of distinct peaks is likely to produce
so-called lethal solutions which are far from the top of any peak. As observed in [6], crossover is
only beneficial in these problem instances when it crosses solutions near the same peak. To address
this issue, we implemented a mating restriction mechanism on top of RTS. We call the resulting
method RTS-MR. As opposed to RTS, only one solution is randomly chosen from the population,
call it A. Ideally A should mate with a solution that is not too far away from it, i.e., a solution in
the same basin of attraction. The obvious way to achieve that is to implement the same method
employed by RTS for finding a similar individual to compete with, and use it for the mating
phase as well. As such, instead of picking the second solution B at random, we scan w individuals
at random from the population and pick the one that is most similar to A (but whose distance to A is
at least 2 bits) to mate with it. The 2-bit minimum distance restriction is used because crossing
two bit strings whose Hamming distance is less than 2 always produces children identical to the
parents, regardless of the crossover operator used. The remaining part of the algorithm is exactly
the same as in RTS. In other words, RTS-MR implements both a mating restriction policy as well
as a competition/replacement restriction.
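
The mate selection rule of RTS-MR can be sketched as below. The function name `select_mate` is hypothetical, and returning `None` when no scanned individual is at least 2 bits away is an assumption; the text does not say how that case is handled.

```python
import random

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def select_mate(pop, a, w, rng):
    """Return the most similar of w random individuals at distance >= 2 from a,
    or None when no scanned individual qualifies."""
    window = rng.sample(pop, min(w, len(pop)))
    candidates = [s for s in window if hamming(a, s) >= 2]
    if not candidates:
        return None           # assumption: the caller would then skip crossover
    return min(candidates, key=lambda s: hamming(a, s))
```

For example, with the population `[[0,0,0,0], [0,0,0,1], [0,0,1,1], [1,1,1,1]]`, `a = [0,0,0,0]` and a window covering the whole population, the selected mate is `[0,0,1,1]`: the two closer strings are excluded by the 2-bit rule.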

We also tested a third algorithm that uses mutation alone and has a replacement strategy that
enforces a crowding-like mechanism when mutation rates are low. The algorithm is very simple. A
solution A is drawn at random from the population. That solution undergoes mutation yielding a
new solution A'. Then A' competes with A and whichever is better is allowed to stay in the population.
With a low mutation rate, A and A' should be similar to each other, and the competition between
them enforces a crowding mechanism, just like in RTS. We name this algorithm (μ; 1+1)-EA, due
to its resemblance to the classical (1+1) and (μ+1) EAs.
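
A single step of this mutation-only algorithm is short enough to sketch directly. The function name is hypothetical, and keeping the parent when parent and child have equal fitness is an assumption, since the text does not specify how ties are resolved.

```python
import random

def mu_one_plus_one_step(pop, fitness, rate, rng):
    """Mutate one random population member and keep the better of parent and child."""
    i = rng.randrange(len(pop))
    child = [bit ^ 1 if rng.random() < rate else bit for bit in pop[i]]
    # Skip the comparison when mutation changed nothing; ties keep the parent.
    if child != pop[i] and fitness(child) > fitness(pop[i]):
        pop[i] = child
```

Each population slot evolves independently of the others, which is why the algorithm behaves like a set of stochastic hillclimbers running side by side.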
Figure 1: Unequal-height peak instances. (a)-(e) Box-and-whisker plots of the number of fitness
evaluations needed to reach the top of the best peak, for instances with n = 20, 40, 80, 160, 320
peaks. (f) Average number of fitness evaluations for those same instances, plotted against the
number of peaks.


5 Experiments 

We tested the three algorithms described in the previous section, RTS, RTS-MR, and (μ; 1+1)-EA,
on the same instances of the multimodal problem generator presented in Section 3.3. Again we did
two sets of experiments, with equal and unequal height peaks, but for each set of experiments we
only used one target goal. For unequal-height peaks the target goal was to reach the top of the
highest fit peak, the global optimum. For equal-height peaks the target goal was to reach the top
of all the peaks.

Our implementation of RTS, RTS-MR, and (μ; 1+1)-EA was done in such a way that new
individuals were only evaluated if that was absolutely required. Whenever a newly created
individual was identical to one of the parent individuals, no fitness evaluation was spent. Similarly,
during the local competition on RTS and RTS-MR, if the two competing individuals were identical,
no fitness evaluation was spent. We did our best to use near-optimal parameter settings for the three
EAs so that they could perform as well as possible.

Uniform crossover was used on all experiments. On instances of the multimodal problem
generator, crossover is only effective when crossing strings in the same basin of attraction. In such
cases it is as if the problem had only a single peak and that would make it equivalent to the
classical onemax problem for which uniform crossover provides better mixing and faster convergence.
We tested three crossover rates Pc = 0.0, 0.5, 0.8 and used bit-flip mutation with probability 1/L.
For RTS and RTS-MR, the window size w was set to the smaller of two values: 4 times the number
of peaks, following the recommendations given in [4], or the population size.

Figure 2: Equal-height peak instances. (a)-(e) Box-and-whisker plots of the number of fitness
evaluations needed to reach the top of all peaks, for instances with n = 20, 40, 80, 160, 320 peaks.
(f) Average number of fitness evaluations for those same instances.

With respect to population sizing we used the bisection method to obtain the minimum
population size that allows the algorithms to reach the target goal on 100/100 independent runs. The
bisection method was first proposed by Sastry in [10] and can be used in controlled experiments
when the target goal is known in advance. The method works as follows. Starting from a very
small population size, it keeps doubling it until a certain quality criterion is reached on a sequence
of independent runs. Once such a large enough population size is found, call it N, we know that
the minimum population size required is between low = N/2 and high = N. The bisection method
then tries a population size in the middle of the two, mid = (high + low)/2, and refines the low or
high bounds accordingly depending on the outcome of the sequence of independent runs performed
with the new size mid. If those runs fulfill the desired quality criterion, high becomes mid,
otherwise low becomes mid. This process is repeated until high and low are within a certain threshold
of each other. At that point the value high is returned as the method’s guess for the minimum
population size that ensures the desired quality criterion. The sequence of runs performed with
that population size is then used for measuring the performance of the algorithm.
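
The bisection procedure can be sketched as follows. This is an illustration under stated assumptions: the callback `succeeds(size)` stands for "the whole sequence of independent runs with this population size met the quality criterion", and the starting size of 2 and the relative threshold of 10% are placeholder values, not the settings used in the experiments.

```python
def bisection(succeeds, start=2, threshold=0.1):
    """Estimate the minimum population size for which succeeds(size) is True."""
    high = start
    while not succeeds(high):           # doubling phase
        high *= 2
    low = high // 2
    # Refinement phase: stop once the bounds are within the relative threshold.
    while (high - low) / low > threshold and high - low > 1:
        mid = (high + low) // 2
        if succeeds(mid):
            high = mid
        else:
            low = mid
    return high

# Usage with a deterministic stand-in for the run outcomes (hypothetical).
estimate = bisection(lambda size: size >= 37)
```

In practice `succeeds` is stochastic, so the returned value varies from one application of the method to the next, which is one reason for repeating it several times.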

The bisection method is usually repeated a number of times. We did 30 independent repetitions, 
yielding a total of 30 * 100 = 3000 runs, per algorithm and per problem instance, that were used 
to measure the performance. On all runs we imposed a limit of 5 million fitness evaluations, upon 
which we considered the run to be unsuccessful. This limit is more than 3 times larger than the 
number of evaluations needed by the worst of the 100 independent runs of MS-NAHC when solving 
the most difficult instance: reaching the top of all peaks on a 320 equal-height peak instance. 

Figures 1 and 2 show the results obtained for increasing number of peaks. Figure 1 is for the
unequal-height case, and the target goal is to reach the top of the best peak. Figure 2 is for
instances with equal-height peaks, and here the target goal is to reach the top of all peaks.

Let’s interpret the results, starting with the unequal-height instances. First observe that the
results of MS-NAHC exhibit a high variability. This is natural because the algorithm can get lucky
and find the best peak quickly but can also get unlucky and need a lot of restarts. Nonetheless, by
looking at the box part of the plots in Figure 1(a)-(e), which spans the 1st to 3rd quartiles
of the distribution, one can clearly observe that MS-NAHC needs fewer function evaluations than
any other algorithm. Among the others, RTS-MR is the second best. This is especially noticeable
for larger numbers of peaks (without much difference with respect to the crossover rate of 0.5 or
0.8). RTS and (μ; 1+1)-EA are the worst ones. Notice that the results of Pc = 0.0 are only
reported for RTS and not for RTS-MR. The reason is that without crossover, RTS-MR and RTS
become identical. Figure 1(f) reports the average number of fitness evaluations needed by the
various algorithms for increasing number of peaks, all in a single plot. Measures of distribution
dispersion shown previously in Figure 1(a)-(e) are not shown in (f) because the plot would become
too cluttered.

Let us now look at the results for equal-height instances in Figure 2. Here the target goal is to
reach the top of all peaks. Again, MS-NAHC is by far the best performing algorithm. (μ; 1+1)-EA
is now the second best, RTS-MR the third best (again with not much difference with respect to
the crossover rate). RTS is the worst one and needs substantial computational effort to reach the
target goal. For the 40 peak problem instance, RTS was unable to reach the goal within 5 million
function evaluations. For instances with 80 or more peaks, only MS-NAHC and (μ; 1+1)-EA were
able to reach the target goal within 5 million function evaluations. Similar to the previous case,
Figure 2(f) reports the average number of fitness evaluations needed by the various algorithms for
increasing number of peaks.

The reason why (μ; 1+1)-EA came out as the second best algorithm for the equal-height instances
can be easily explained. (μ; 1+1)-EA is essentially a parallel stochastic hillclimber. It uses a
population and all of its members are climbing to the top of their nearest peak, and do so more or
less in parallel. As opposed to that, MS-NAHC finds the top of the peaks sequentially, one after the
other. The reason why MS-NAHC is faster is that NAHC is also faster than (1+1)-EA when
reaching the top of a single peak, although both have the same asymptotic time complexity. As
an aside, an interesting and related discussion on the topic of sequential and parallel niching can
be found in Mahfoud’s work [7], even though his notion of sequential niching involves changing the
fitness landscape once a peak is found, something that our MS-NAHC is not doing here.

6 An idealized niching strategy 

The results presented previously showed the superiority of MS-NAHC on this class of problems.
As pointed out previously, it seems that instances of the multimodal problem
generator are more easily solved by a multistart hillclimbing strategy than by an evolutionary
algorithm. If that was already clear for EAs without niching [6], the results presented in this paper
confirm the superiority of multistart hillclimbing, even over EAs with niching. We tried hard to
come up with an EA that could beat the hillclimbing strategy but were unable to do so, even with
careful tuning of the EAs’ parameters. It seems challenging to design an EA that could be more
efficient than the multistart hillclimbing strategy on this class of problems. We cannot rule out the
existence of such an EA though. In what follows, we present an idealized niching algorithm that in
some sense is the best that any niching algorithm could possibly achieve for solving instances from
this class of problems.

When solving an n-peak instance, suppose that at the end of each generation the idealized
niching algorithm is able to cluster the population into n sub-populations, each containing the
solutions that are in the basin of attraction of a particular peak. In other words, the idealized
algorithm knows where the niches are. This is of course an unrealistic assumption to hold in practice,
but that’s precisely the reason why we call it an idealized algorithm. Under this assumption, the
idealized algorithm could restrict selection and mating to occur only within members of each
sub-population, in effect being equivalent to having n separate EAs.

A simple analysis suggests that such an algorithm can be faster than MS-NAHC when the
goal is to reach the top of the peaks. Clearly the population size has to be Ω(n lg n) because it has
to contain at least one solution in the basin of attraction of every peak. The n lg n bound comes
from the same analysis performed in Section 3.2.2 when calculating the expected number of NAHC
calls for reaching the top of all peaks. On the other hand, convergence theory developed in [8]
says that the number of generations until convergence for a genetic algorithm on a problem like
onemax is O(√L) for sufficiently large population sizes. Since climbing up a single peak in these
instances is equivalent to solving onemax, the theory should be applicable and a total running time
of O(n lg n √L) should hold for the idealized niching algorithm. This compares favourably with
the O(n lg n · L lg L) bound derived earlier for the running time of MS-NAHC (see Equation 6).
To verify this hypothesis we implemented the idealized algorithm using the nearest peak of each
population member to perform the clustering into sub-populations. We set the population size to
20 n lg n. This ensures that for all problem instances, every peak has in expectation at least 20
solutions in its basin of attraction. We conjecture that this should be enough for not losing basins
of attraction of any peak through the entire run. Within each sub-population, each generation used
tournament selection of size 2, uniform crossover with Pc = 0.8, 1/L bit-flip mutation rate, and a
50% worst replacement strategy to maintain some sort of elitism. Experimental results over 100
independent runs confirm that the idealized niching algorithm compares favourably with respect
to the MS-NAHC algorithm (see Figure 3).

The only caveat is that the idealized niching algorithm is not realistic in practice; it is assuming 
that it knows where the niches are, which is what all niching algorithms are trying to learn during 
the search itself. Nonetheless, the idealized algorithm can be seen as a lower bound for the best 
running time that any EA can possibly achieve to solve this kind of problem instances. 

7 Summary and Conclusions 

This paper presented a mathematical analysis of multistart next ascent hillclimbing for solving
instances of the multimodal problem generator and confirmed the analysis through computer
simulations. It also showed that conventional niching and mating restriction techniques incorporated
in an EA were not sufficient to make them competitive with the hillclimbing strategy, even though
they were useful and improved the performance of the EAs substantially when compared with EAs
without niching as used in [6]. The paper also presented an idealized niching strategy for this type
of instances whose performance should be very close to asymptotically optimal for this class of
problems.

Figure 3: Average number of fitness evaluations of the idealized niching algorithm vs. MS-NAHC,
over 100 independent runs, needed to reach the top of all n peaks on equal-height peak instances,
plotted against the number of peaks.

We conjecture that instances of the multimodal problem generator are in some sense the easiest 
of all multimodal optimization problems. The reason for this claim is that even a simple multistart 
hillclimbing strategy is able to solve these problems, apparently faster than any practical EA. 
This observation makes us think that this class of problems is suitable for theoretical studies in
multimodal optimization, in the same spirit that the onemax problem has been extensively analysed 
in the Evolutionary Computation literature. 

The multistart hillclimbing strategy can only solve relatively simple problems. For example, it
would fail miserably in the massive multimodal problem [3] where the global optima are surrounded
by millions of local optima. In many other fitness landscapes, the various optima are related to
each other in some way. In such cases, restarting the search from scratch whenever a local optimum
is found is certainly not the best strategy.

Finally, in the same spirit of what others have advocated in the past mm], the results presented
in this paper suggest that hillclimbing strategies shouldn’t be easily dismissed and should be used
as a baseline method when comparing algorithms, even in the case of multimodal optimization.